Students from Department of Mathematics and Statistics (DMS) in University DAR (UDAR) study one of the following three-year Bachelor programmes:
1) The tuition fee for every trimester is between RM3,000 - RM4,000.
2) The administration fee for each year is RM1,500 - RM2,000.
1) Scholarship A
2) Scholarship B
3) Financial Assistance
A fund with total amount of RM1,000,000 is created for new students that enrol in May 2021. Hence, the new student must have completed the study of STPM/A-level/UDAR Foundation/etc before enrolling in degree courses above.
1) Pivot Table
3) Model
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.io as pio
import plotly.express as px
import warnings
import os
warnings.filterwarnings("ignore")
#practice only
dir = []
for i, j, ks in os.walk(r"C:/Users/USER/Desktop/DAV Assignment"):
for k in ks:
dir.append(os.path.join(i, k))
print(os.path.join(i, k))
C:/Users/USER/Desktop/DAV Assignment\Dataset.csv C:/Users/USER/Desktop/DAV Assignment\testing777.csv C:/Users/USER/Desktop/DAV Assignment\testing778.csv C:/Users/USER/Desktop/DAV Assignment\testing800.csv C:/Users/USER/Desktop/DAV Assignment\testing801.csv
#practice only
a1,a2,a3,a4,a5 = dir
df = pd.read_csv(a1)
df2 = pd.read_csv(a2)
df3 = pd.read_csv(a3)
df4 = pd.read_csv(a4)
df5 = pd.read_csv(a5)
temp_age = df4[df4["Age"] > 25].index.tolist()
for i in temp_age:
df4.loc[i, "Age"] = np.random.choice([18]*8 + [19]*5 + [20]*4 + [21]*2 + [22])
df4.loc[64, "Family_income"] = "Less than RM3500"
temp_list22 = df4[df4["Family_income"] == "other"].index.tolist()
for i in temp_list22:
df4.loc[i, "Family_income"] = "Other"
df4.to_csv("~/Desktop/DAV Assignment/Dataset.csv")
df.drop(["Unnamed: 0", "Unnamed: 0.1"], 1, inplace = True)
df.columns
Index(['Applicant_name', 'Age', 'CGPA', 'Family_income', 'School',
'1st_sem_tuition_fee', 'course', 'Parent_marriage', 'Status'],
dtype='object')
categories = ["Family_income", "School", "course", "Parent_marriage", "Status"]
values = ["CGPA", "1st_sem_tuition_fee"]
df.describe()
| Age | CGPA | 1st_sem_tuition_fee | |
|---|---|---|---|
| count | 513.000000 | 513.000000 | 513.000000 |
| mean | 20.038986 | 2.811404 | 5518.518519 |
| std | 2.284125 | 0.420336 | 293.929713 |
| min | 18.000000 | 2.106000 | 5000.000000 |
| 25% | 18.000000 | 2.473000 | 5300.000000 |
| 50% | 19.000000 | 2.762000 | 5500.000000 |
| 75% | 22.000000 | 3.060000 | 5800.000000 |
| max | 25.000000 | 4.000000 | 6000.000000 |
for i in categories:
df[i] = df[i].astype("category")
print(df.info())
<class 'pandas.core.frame.DataFrame'> RangeIndex: 513 entries, 0 to 512 Data columns (total 9 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Applicant_name 513 non-null object 1 Age 513 non-null int64 2 CGPA 513 non-null float64 3 Family_income 513 non-null category 4 School 513 non-null category 5 1st_sem_tuition_fee 513 non-null float64 6 course 513 non-null category 7 Parent_marriage 513 non-null category 8 Status 513 non-null category dtypes: category(5), float64(2), int64(1), object(1) memory usage: 20.3+ KB None
df_new = pd.read_csv("~/Desktop/DAV Assignment/testing801.csv")
temp_new = df_new["Education"].tolist()
df["Education"] = temp_new
df = df.iloc[:, [0,1,3,7,4,9,6,2,5,8]]
df
| Applicant_name | Age | Family_income | Parent_marriage | School | Education | course | CGPA | 1st_sem_tuition_fee | Status | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alex lo | 21 | Other | married | Kuching High School | STPM | Computer Science | 3.143 | 5800.0 | 50% scholarship |
| 1 | Kyrie li | 18 | Other | married | Kuala Selangor High School | STPM | Mechanical Engineering | 3.079 | 5200.0 | 50% scholarship |
| 2 | Albert lau | 19 | Less than RM3500 | married | Kajang High School | UEC | Civil Engineering | 2.387 | 5600.0 | No financial asistance |
| 3 | Eric koh | 18 | Less than RM3500 | married | Kota Kinabalu High School | STPM | Mechanical Engineering | 2.633 | 5900.0 | No financial asistance |
| 4 | Andy zhou | 23 | Less than RM1500 | divorced | Lake Gardens High School | UDAC Foundation | Applied Mathematics with Computing | 2.995 | 5500.0 | Financial assistance |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 508 | Angela lim | 18 | Other | married | Malacca High School | UEC | Financial Mathematics | 2.961 | 5100.0 | No financial asistance |
| 509 | Stephen chong | 18 | Less than RM3500 | married | Kajang High School | UEC | Statistics | 2.981 | 5800.0 | No financial asistance |
| 510 | Jenifer chen | 18 | Less than RM3500 | married | Kota Kinabalu High School | UDAC Foundation | Electrical & Electronic Engineering | 2.606 | 5200.0 | No financial asistance |
| 511 | Kevin Goh | 20 | Less than RM1500 | married | Malacca Chinese High School | UDAC Foundation | Mechanical Engineering | 3.188 | 5800.0 | 50% scholarship |
| 512 | Kyrie liew | 23 | Less than RM1500 | married | Foon Yew High School | UEC | Mechanical Engineering | 3.198 | 5500.0 | 50% scholarship |
513 rows × 10 columns
def pivot_table(df111, index = None, columns = None, values = ["Age", "CGPA"], aggfunc = {"Age":["mean", "count", "max", "min"],
"CGPA":["mean", "count", "max", "min"]}):
pv = pd.pivot_table(df111, index = index, columns = columns, values = values, aggfunc = aggfunc)
return pv
df_math = df.query('course in ["Applied Mathematics with Computing", "Financial Mathematics", "Statistics"]')
df_math = df_math.reset_index().drop("index", 1)
df_math
| Applicant_name | Age | Family_income | Parent_marriage | School | Education | course | CGPA | 1st_sem_tuition_fee | Status | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Andy zhou | 23 | Less than RM1500 | divorced | Lake Gardens High School | UDAC Foundation | Applied Mathematics with Computing | 2.995 | 5500.0 | Financial assistance |
| 1 | Sam ong | 20 | Less than RM1500 | married | Chinese High School (Batu Pahat) | UDAC Foundation | Statistics | 3.240 | 5500.0 | 50% scholarship |
| 2 | May li | 18 | Less than RM3500 | married | Kuching High School | UDAC Foundation | Financial Mathematics | 3.068 | 5900.0 | 50% scholarship |
| 3 | Jason loh | 21 | Other | married | Sabah Chinese High School | UDAC Foundation | Applied Mathematics with Computing | 2.264 | 6000.0 | No financial asistance |
| 4 | Cindy chia | 18 | Less than RM1500 | married | Kota Kinabalu High School | STPM | Statistics | 2.739 | 5100.0 | Financial assistance |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99 | Jenifer wong | 22 | Less than RM3500 | married | Kuala Selangor High School | UDAC Foundation | Financial Mathematics | 2.670 | 5500.0 | No financial asistance |
| 100 | Anthony lau | 20 | Less than RM3500 | married | Sarikei High School | STPM | Applied Mathematics with Computing | 2.709 | 5600.0 | No financial asistance |
| 101 | Cindy chong | 18 | Other | married | Bukit Mertajam High School | A-Level | Financial Mathematics | 2.411 | 5500.0 | No financial asistance |
| 102 | Angela lim | 18 | Other | married | Malacca High School | UEC | Financial Mathematics | 2.961 | 5100.0 | No financial asistance |
| 103 | Stephen chong | 18 | Less than RM3500 | married | Kajang High School | UEC | Statistics | 2.981 | 5800.0 | No financial asistance |
104 rows × 10 columns
pv1_all = pivot_table(df, index = "course")
pv1_math = pivot_table(df_math, index = "course").dropna()
pv2_all = pivot_table(df, index = "School")
pv2_math = pivot_table(df_math, index = "School")
pv3_all = pivot_table(df, index = "Family_income")
pv3_math = pivot_table(df_math, index = "Family_income")
pv4_all = pivot_table(df, index = "Parent_marriage")
pv4_math = pivot_table(df_math, index = "Parent_marriage")
pv5_all = pivot_table(df, index = "Status")
pv5_math = pivot_table(df_math, index = "Status")
pv1_all
| Age | CGPA | |||||||
|---|---|---|---|---|---|---|---|---|
| count | max | mean | min | count | max | mean | min | |
| course | ||||||||
| Applied Mathematics with Computing | 41.0 | 25.0 | 20.731707 | 18.0 | 41 | 3.978 | 2.849000 | 2.176000 |
| Chemical Engineering | 46.0 | 24.0 | 19.826087 | 18.0 | 46 | 3.593 | 2.774493 | 2.284792 |
| Civil Engineering | 41.0 | 25.0 | 20.121951 | 18.0 | 41 | 3.902 | 2.784585 | 2.108000 |
| Computer Engineering | 46.0 | 25.0 | 20.543478 | 18.0 | 46 | 3.914 | 2.804670 | 2.133000 |
| Computer Science | 92.0 | 25.0 | 20.010870 | 18.0 | 92 | 3.652 | 2.818749 | 2.136863 |
| Electrical & Electronic Engineering | 51.0 | 25.0 | 19.823529 | 18.0 | 51 | 4.000 | 2.701645 | 2.106000 |
| Financial Mathematics | 26.0 | 25.0 | 20.230769 | 18.0 | 26 | 3.975 | 2.706654 | 2.237000 |
| Mechanical Engineering | 46.0 | 25.0 | 19.521739 | 18.0 | 46 | 3.982 | 2.875892 | 2.169000 |
| Physics | 15.0 | 25.0 | 19.600000 | 18.0 | 15 | 3.958 | 2.893067 | 2.260000 |
| Software Engineering | 72.0 | 25.0 | 19.888889 | 18.0 | 72 | 3.976 | 2.854679 | 2.157000 |
| Statistics | 37.0 | 25.0 | 20.162162 | 18.0 | 37 | 3.902 | 2.862865 | 2.108000 |
pv2_all
| Age | CGPA | |||||||
|---|---|---|---|---|---|---|---|---|
| count | max | mean | min | count | max | mean | min | |
| School | ||||||||
| All Saints Secondary School | 22.0 | 24.0 | 20.409091 | 18.0 | 22 | 3.975 | 2.865440 | 2.305000 |
| Batu Pahat High School | 19.0 | 25.0 | 19.842105 | 18.0 | 19 | 3.512 | 2.731835 | 2.136863 |
| Bukit Mertajam High School | 20.0 | 24.0 | 19.650000 | 18.0 | 20 | 3.559 | 2.871800 | 2.360000 |
| Catholic High School, Malaysia | 21.0 | 25.0 | 19.523810 | 18.0 | 21 | 3.982 | 2.872333 | 2.108000 |
| Chinese High School (Batu Pahat) | 24.0 | 25.0 | 20.500000 | 18.0 | 24 | 3.946 | 2.951167 | 2.209000 |
| Foon Yew High School | 19.0 | 25.0 | 21.052632 | 18.0 | 19 | 3.593 | 2.860952 | 2.286000 |
| Kajang High School | 24.0 | 25.0 | 20.333333 | 18.0 | 24 | 3.958 | 2.796125 | 2.165000 |
| Klang High School | 21.0 | 25.0 | 19.476190 | 18.0 | 21 | 3.591 | 2.884952 | 2.157000 |
| Kluang High School | 17.0 | 24.0 | 20.058824 | 18.0 | 17 | 3.454 | 2.819882 | 2.169000 |
| Kota Kinabalu High School | 24.0 | 25.0 | 19.541667 | 18.0 | 24 | 3.914 | 2.765908 | 2.108000 |
| Kuala Selangor High School | 28.0 | 25.0 | 19.607143 | 18.0 | 28 | 3.621 | 2.752485 | 2.176000 |
| Kuching High School | 21.0 | 25.0 | 20.238095 | 18.0 | 21 | 3.578 | 2.757056 | 2.252000 |
| Lake Gardens High School | 27.0 | 25.0 | 20.481481 | 18.0 | 27 | 3.978 | 2.948407 | 2.209000 |
| Malacca Chinese High School | 33.0 | 25.0 | 20.212121 | 18.0 | 33 | 3.812 | 2.820939 | 2.182000 |
| Malacca Girls' High School | 20.0 | 24.0 | 19.850000 | 18.0 | 20 | 3.495 | 2.662333 | 2.247000 |
| Malacca High School | 29.0 | 25.0 | 20.103448 | 18.0 | 29 | 3.718 | 2.727057 | 2.119000 |
| Muar High School | 16.0 | 25.0 | 20.000000 | 18.0 | 16 | 3.521 | 2.814250 | 2.106000 |
| Port Dickson High School | 21.0 | 24.0 | 19.714286 | 18.0 | 21 | 3.583 | 2.738714 | 2.233000 |
| Sabah Chinese High School | 18.0 | 23.0 | 19.777778 | 18.0 | 18 | 3.611 | 2.712970 | 2.133000 |
| Sarikei High School | 24.0 | 25.0 | 20.333333 | 18.0 | 24 | 3.439 | 2.839291 | 2.285000 |
| Segamat High School | 23.0 | 25.0 | 19.869565 | 18.0 | 23 | 3.642 | 2.757430 | 2.254000 |
| Setapak High School | 22.0 | 25.0 | 20.000000 | 18.0 | 22 | 3.785 | 2.747492 | 2.204000 |
| St. David's High School, Malacca | 20.0 | 25.0 | 20.150000 | 18.0 | 20 | 4.000 | 2.948679 | 2.298000 |
pv3_all
| Age | CGPA | |||||||
|---|---|---|---|---|---|---|---|---|
| count | max | mean | min | count | max | mean | min | |
| Family_income | ||||||||
| Less than RM1500 | 133.0 | 25.0 | 20.075188 | 18.0 | 133 | 3.914 | 2.757697 | 2.204 |
| Less than RM2500 | 141.0 | 25.0 | 20.007092 | 18.0 | 141 | 4.000 | 2.900345 | 2.182 |
| Less than RM3500 | 114.0 | 25.0 | 20.175439 | 18.0 | 114 | 3.785 | 2.735137 | 2.106 |
| Other | 125.0 | 25.0 | 19.912000 | 18.0 | 125 | 3.978 | 2.837778 | 2.165 |
pv4_all
| Age | CGPA | |||||||
|---|---|---|---|---|---|---|---|---|
| count | max | mean | min | count | max | mean | min | |
| Parent_marriage | ||||||||
| divorced | 21.0 | 25.0 | 19.857143 | 18.0 | 21 | 3.202 | 2.729238 | 2.165 |
| married | 487.0 | 25.0 | 20.036961 | 18.0 | 487 | 4.000 | 2.816310 | 2.106 |
| widowed | 5.0 | 25.0 | 21.000000 | 18.0 | 5 | 2.905 | 2.678600 | 2.474 |
pv5_all
| Age | CGPA | |||||||
|---|---|---|---|---|---|---|---|---|
| count | max | mean | min | count | max | mean | min | |
| Status | ||||||||
| 100% scholarship | 40.0 | 25.0 | 20.225000 | 18.0 | 40 | 4.000 | 3.703250 | 3.503 |
| 50% scholarship | 114.0 | 25.0 | 20.017544 | 18.0 | 114 | 3.495 | 3.201535 | 3.000 |
| Financial assistance | 105.0 | 25.0 | 20.000000 | 18.0 | 105 | 2.995 | 2.584207 | 2.204 |
| No financial asistance | 254.0 | 25.0 | 20.035433 | 18.0 | 254 | 2.997 | 2.589777 | 2.106 |
pv1_math
| Age | CGPA | |||||||
|---|---|---|---|---|---|---|---|---|
| count | max | mean | min | count | max | mean | min | |
| course | ||||||||
| Applied Mathematics with Computing | 41.0 | 25.0 | 20.731707 | 18.0 | 41 | 3.978 | 2.849000 | 2.176 |
| Financial Mathematics | 26.0 | 25.0 | 20.230769 | 18.0 | 26 | 3.975 | 2.706654 | 2.237 |
| Statistics | 37.0 | 25.0 | 20.162162 | 18.0 | 37 | 3.902 | 2.862865 | 2.108 |
pv2_math
| Age | CGPA | |||||||
|---|---|---|---|---|---|---|---|---|
| count | max | mean | min | count | max | mean | min | |
| School | ||||||||
| All Saints Secondary School | 5.0 | 21.0 | 20.000000 | 19.0 | 5 | 3.975 | 3.296600 | 2.349 |
| Batu Pahat High School | 1.0 | 19.0 | 19.000000 | 19.0 | 1 | 2.549 | 2.549000 | 2.549 |
| Bukit Mertajam High School | 6.0 | 24.0 | 20.333333 | 18.0 | 6 | 2.912 | 2.638000 | 2.366 |
| Catholic High School, Malaysia | 4.0 | 18.0 | 18.000000 | 18.0 | 4 | 3.489 | 2.845250 | 2.313 |
| Chinese High School (Batu Pahat) | 9.0 | 25.0 | 21.222222 | 18.0 | 9 | 3.946 | 3.051222 | 2.237 |
| Foon Yew High School | 3.0 | 25.0 | 21.666667 | 18.0 | 3 | 2.955 | 2.677333 | 2.286 |
| Kajang High School | 5.0 | 25.0 | 20.200000 | 18.0 | 5 | 2.981 | 2.742200 | 2.362 |
| Klang High School | 5.0 | 25.0 | 20.800000 | 18.0 | 5 | 3.323 | 2.975200 | 2.788 |
| Kluang High School | 3.0 | 23.0 | 19.666667 | 18.0 | 3 | 3.123 | 2.821667 | 2.623 |
| Kota Kinabalu High School | 5.0 | 20.0 | 18.600000 | 18.0 | 5 | 3.202 | 2.691600 | 2.108 |
| Kuala Selangor High School | 3.0 | 23.0 | 21.333333 | 19.0 | 3 | 2.670 | 2.383667 | 2.176 |
| Kuching High School | 5.0 | 25.0 | 20.400000 | 18.0 | 5 | 3.068 | 2.585200 | 2.252 |
| Lake Gardens High School | 5.0 | 25.0 | 21.600000 | 18.0 | 5 | 3.978 | 3.338400 | 2.401 |
| Malacca Chinese High School | 5.0 | 22.0 | 19.400000 | 18.0 | 5 | 2.886 | 2.710000 | 2.478 |
| Malacca Girls' High School | 1.0 | 24.0 | 24.000000 | 24.0 | 1 | 2.746 | 2.746000 | 2.746 |
| Malacca High School | 10.0 | 25.0 | 20.300000 | 18.0 | 10 | 3.718 | 2.859300 | 2.400 |
| Muar High School | 4.0 | 25.0 | 20.750000 | 18.0 | 4 | 3.196 | 2.687250 | 2.247 |
| Port Dickson High School | 5.0 | 24.0 | 21.800000 | 18.0 | 5 | 2.795 | 2.493600 | 2.233 |
| Sabah Chinese High School | 4.0 | 21.0 | 20.000000 | 18.0 | 4 | 3.447 | 2.849000 | 2.264 |
| Sarikei High School | 3.0 | 20.0 | 19.000000 | 18.0 | 3 | 2.709 | 2.558667 | 2.285 |
| Segamat High School | 4.0 | 25.0 | 21.500000 | 18.0 | 4 | 3.642 | 3.007750 | 2.285 |
| Setapak High School | 5.0 | 25.0 | 21.000000 | 18.0 | 5 | 3.387 | 2.694600 | 2.204 |
| St. David's High School, Malacca | 4.0 | 22.0 | 19.500000 | 18.0 | 4 | 3.524 | 2.853250 | 2.478 |
pv3_math
| Age | CGPA | |||||||
|---|---|---|---|---|---|---|---|---|
| count | max | mean | min | count | max | mean | min | |
| Family_income | ||||||||
| Less than RM1500 | 22.0 | 25.0 | 19.681818 | 18.0 | 22 | 3.524 | 2.657545 | 2.204 |
| Less than RM2500 | 28.0 | 25.0 | 20.642857 | 18.0 | 28 | 3.975 | 2.944464 | 2.233 |
| Less than RM3500 | 25.0 | 25.0 | 20.600000 | 18.0 | 25 | 3.578 | 2.811560 | 2.108 |
| Other | 29.0 | 25.0 | 20.551724 | 18.0 | 29 | 3.978 | 2.824414 | 2.176 |
pv4_math
| Age | CGPA | |||||||
|---|---|---|---|---|---|---|---|---|
| count | max | mean | min | count | max | mean | min | |
| Parent_marriage | ||||||||
| divorced | 5.0 | 24.0 | 21.600000 | 19.0 | 5 | 3.202 | 2.85380 | 2.538 |
| married | 98.0 | 25.0 | 20.367347 | 18.0 | 98 | 3.978 | 2.82002 | 2.108 |
| widowed | 1.0 | 18.0 | 18.000000 | 18.0 | 1 | 2.477 | 2.47700 | 2.477 |
pv5_math
| Age | CGPA | |||||||
|---|---|---|---|---|---|---|---|---|
| count | max | mean | min | count | max | mean | min | |
| Status | ||||||||
| 100% scholarship | 10.0 | 25.0 | 20.900000 | 18.0 | 10 | 3.978 | 3.762500 | 3.524 |
| 50% scholarship | 20.0 | 25.0 | 20.000000 | 18.0 | 20 | 3.489 | 3.198800 | 3.000 |
| Financial assistance | 20.0 | 25.0 | 20.000000 | 18.0 | 20 | 2.995 | 2.562300 | 2.204 |
| No financial asistance | 54.0 | 25.0 | 20.611111 | 18.0 | 54 | 2.981 | 2.597426 | 2.108 |
pv6 = pd.pivot_table(df_math, index = ["Family_income", "Parent_marriage"], values = ["Age", "CGPA"],
aggfunc = {"Age": ["count", "max", "mean", "min"], "CGPA": ["count", "max", "mean", "min"]},
fill_value = 0)
pv6 = round(pv6, 3)
pv6
| Age | CGPA | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| count | max | mean | min | count | max | mean | min | ||
| Family_income | Parent_marriage | ||||||||
| Less than RM1500 | divorced | 3 | 23 | 20.667 | 19 | 3 | 3.202 | 2.995 | 2.788 |
| married | 18 | 25 | 19.611 | 18 | 18 | 3.524 | 2.611 | 2.204 | |
| widowed | 1 | 18 | 18.000 | 18 | 1 | 2.477 | 2.477 | 2.477 | |
| Less than RM2500 | divorced | 1 | 24 | 24.000 | 24 | 1 | 2.746 | 2.746 | 2.746 |
| married | 27 | 25 | 20.519 | 18 | 27 | 3.975 | 2.952 | 2.233 | |
| widowed | 0 | 0 | 0.000 | 0 | 0 | 0.000 | 0.000 | 0.000 | |
| Less than RM3500 | divorced | 1 | 22 | 22.000 | 22 | 1 | 2.538 | 2.538 | 2.538 |
| married | 24 | 25 | 20.542 | 18 | 24 | 3.578 | 2.823 | 2.108 | |
| widowed | 0 | 0 | 0.000 | 0 | 0 | 0.000 | 0.000 | 0.000 | |
| Other | divorced | 0 | 0 | 0.000 | 0 | 0 | 0.000 | 0.000 | 0.000 |
| married | 29 | 25 | 20.552 | 18 | 29 | 3.978 | 2.824 | 2.176 | |
| widowed | 0 | 0 | 0.000 | 0 | 0 | 0.000 | 0.000 | 0.000 | |
fig = px.histogram(df_math, x = "Family_income", color = "Parent_marriage", width = 900, height = 370)
fig.update_layout(barmode = "group", title = "<b>Family income</b>", titlefont = {"size": 24})
fig.show()
fig = px.pie(df_math, names = "School")
fig.update_layout(height = 600, width = 800)
fig.show()
fig = px.pie(df, names = "Education")
fig.update_layout(height = 300, width = 700)
fig.show()
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
fig = make_subplots(rows = 3, cols = 3,
specs = [[{}, {}, {}],
[{}, {}, {}],
[{}, {}, {}]],
print_grid = True)
fig.add_trace(go.Histogram(x = df_math["Parent_marriage"], name = "Parent Marriage", marker_color = "#a1a1a1"), row = 1, col = 1)
fig.add_trace(go.Violin(x = df_math["Parent_marriage"], y = df_math["CGPA"]), row = 2, col = 1)
fig.add_trace(go.Box(x = df_math["Parent_marriage"], y = df_math["Age"]), row = 3, col = 1)
fig.add_trace(go.Histogram(x = df_math["Family_income"], name = "Family Income", marker_color = "green"), row = 1, col = 2)
fig.add_trace(go.Violin(x = df_math["Family_income"], y = df_math["CGPA"]), row = 2, col = 2)
fig.add_trace(go.Box(x = df_math["Family_income"], y = df_math["Age"]), row = 3, col = 2)
fig.add_trace(go.Histogram(x = df_math["course"], name = "Course", marker_color = "cyan"), row = 1, col = 3)
fig.add_trace(go.Violin(x = df_math["course"], y = df_math["CGPA"]), row = 2, col = 3)
fig.add_trace(go.Box(x = df_math["course"], y = df_math["Age"]), row = 3, col = 3)
fig.update_xaxes(title_text = "Parent Marriage", showgrid = False, row = 3, col = 1)
fig.update_xaxes(title_text = "Family Income", showgrid = False, row = 3, col = 2)
fig.update_xaxes(title_text = "Course", showgrid = False, row = 3, col = 3)
fig.update_layout(height = 900, width = 980, showlegend = False, title_text = "<b>Data Visualization using Histogram, Violin plot and Box plot",
titlefont = {"size": 28, "family": "Serif"}, paper_bgcolor = "#ededed", plot_bgcolor = "#ededed")
fig.show()
This is the format of your plot grid: [ (1,1) x,y ] [ (1,2) x2,y2 ] [ (1,3) x3,y3 ] [ (2,1) x4,y4 ] [ (2,2) x5,y5 ] [ (2,3) x6,y6 ] [ (3,1) x7,y7 ] [ (3,2) x8,y8 ] [ (3,3) x9,y9 ]
from sklearn.preprocessing import OneHotEncoder
import tensorflow as tf
import warnings
warnings.filterwarnings("ignore")
df_1 = pd.read_csv("~/Desktop/DAV Assignment/testing801.csv")
df_1.drop("Unnamed: 0", 1, inplace = True)
df_1.head()
| CGPA | Family_income | Education | Parent_marriage | Status | |
|---|---|---|---|---|---|
| 0 | 3.143 | Other | STPM | married | 50% scholarship |
| 1 | 3.079 | Other | STPM | married | 50% scholarship |
| 2 | 2.387 | Less than RM3500 | UEC | married | No financial asistance |
| 3 | 2.633 | Less than RM3500 | STPM | married | No financial asistance |
| 4 | 2.995 | Less than RM1500 | UDAC Foundation | divorced | Financial assistance |
df_X = df.loc[:, ["CGPA", "Family_income", "Parent_marriage"]]
df_Y = df.loc[:, ["Status"]]
X1 = OneHotEncoder().fit_transform(df_X.loc[:, ["Family_income", "Parent_marriage"]]).toarray()
temp = df_X.loc[:, "CGPA"]
X2_norm = (temp - np.mean(temp)) / np.std(temp)
temp = temp.to_numpy()
temp = np.reshape(temp, [-1, 1])
temp.shape
(513, 1)
from sklearn.preprocessing import MinMaxScaler, StandardScaler
X2 = StandardScaler().fit_transform(temp)
X3 = MinMaxScaler().fit_transform(temp)
X_train = np.concatenate((X3, X1), 1)
np_Y = df_Y.to_numpy()
Y_train = OneHotEncoder().fit_transform(np_Y).toarray()
X_train.shape, Y_train.shape
((513, 8), (513, 4))
tf.compat.v1.disable_eager_execution()
X = tf.compat.v1.placeholder("float32", [None, 8])
Y = tf.compat.v1.placeholder("float32", [None, 4])
W1 = tf.Variable(tf.compat.v1.random.normal([8, 50]))
b1 = tf.Variable(tf.zeros(50))
h1 = tf.compat.v1.nn.sigmoid(tf.matmul(X, W1) + b1)
W2 = tf.Variable(tf.compat.v1.random.normal([50, 50]))
b2 = tf.Variable(tf.zeros(50))
h2 = tf.compat.v1.nn.sigmoid(tf.matmul(h1, W2) + b2)
W3 = tf.Variable(tf.compat.v1.random.normal([50, 4]))
b3 = tf.Variable(tf.zeros(4))
y_pred = tf.compat.v1.nn.softmax(tf.matmul(h2, W3) + b3)
obj = tf.reduce_sum(-Y*tf.compat.v1.log(y_pred))
opt = tf.compat.v1.train.AdamOptimizer().minimize(obj)
correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess = tf.compat.v1.Session()
sess.run(tf.compat.v1.global_variables_initializer())
batch = 60
n_epoch = 600
for i in range(n_epoch):
for j in range(X_train.shape[0] // batch):
batch_xs, batch_ys = X_train[j*batch:(j+1)*batch, :], Y_train[j*batch:(j+1)*batch]
sess.run(opt, feed_dict = {X: batch_xs, Y: batch_ys})
if (i%30) == 0:
print("Epoch {}, the accuracy is {}".format(i, sess.run(accuracy, feed_dict = {X: X_train, Y: Y_train})))
Epoch 0, the accuracy is 0.4931773841381073 Epoch 30, the accuracy is 0.8050682544708252 Epoch 60, the accuracy is 0.8693957328796387 Epoch 90, the accuracy is 0.9083820581436157 Epoch 120, the accuracy is 0.9356725215911865 Epoch 150, the accuracy is 0.9512670636177063 Epoch 180, the accuracy is 0.9688109159469604 Epoch 210, the accuracy is 0.9785575270652771 Epoch 240, the accuracy is 0.9863547682762146 Epoch 270, the accuracy is 0.9902533888816833 Epoch 300, the accuracy is 0.9922027587890625 Epoch 330, the accuracy is 0.9941520690917969 Epoch 360, the accuracy is 0.9961013793945312 Epoch 390, the accuracy is 0.9980506896972656 Epoch 420, the accuracy is 0.9980506896972656 Epoch 450, the accuracy is 0.9980506896972656 Epoch 480, the accuracy is 0.9980506896972656 Epoch 510, the accuracy is 0.9980506896972656 Epoch 540, the accuracy is 0.9980506896972656 Epoch 570, the accuracy is 1.0
mn = sess.run(y_pred, feed_dict = {X:X_train})
mn2 = np.argmax(mn, 1)
from sklearn.preprocessing import LabelEncoder
mn1 = LabelEncoder().fit_transform(df_Y)
df_11 = pd.DataFrame({"Actual": mn1, "Predict": mn2})
print(((df_11["Actual"] == df_11["Predict"]).sum())/len(df_11))
1.0
dict11 = {0:"100% Scholarship", 1:"50% Scholarship", 2: "Financial assistance", 3: "No financial assistance"}
df_11["Actual Status"] = df_11["Actual"].map(dict11)
df_11["Predicted Status"] = df_11["Predict"].map(dict11)
df_final = df_1.drop("Status", 1)
df_final["Actual Status"] = df_11["Actual Status"]
df_final["Predicted Status"] = df_11["Predicted Status"]
df_final.head(10)
| CGPA | Family_income | Education | Parent_marriage | Actual Status | Predicted Status | |
|---|---|---|---|---|---|---|
| 0 | 3.143000 | Other | STPM | married | 50% Scholarship | 50% Scholarship |
| 1 | 3.079000 | Other | STPM | married | 50% Scholarship | 50% Scholarship |
| 2 | 2.387000 | Less than RM3500 | UEC | married | No financial assistance | No financial assistance |
| 3 | 2.633000 | Less than RM3500 | STPM | married | No financial assistance | No financial assistance |
| 4 | 2.995000 | Less than RM1500 | UDAC Foundation | divorced | Financial assistance | Financial assistance |
| 5 | 2.433000 | Less than RM3500 | UDAC Foundation | married | No financial assistance | No financial assistance |
| 6 | 2.636000 | Less than RM1500 | UEC | married | Financial assistance | Financial assistance |
| 7 | 2.792000 | Less than RM2500 | STPM | married | No financial assistance | No financial assistance |
| 8 | 2.350137 | Less than RM1500 | UDAC Foundation | married | Financial assistance | Financial assistance |
| 9 | 2.412000 | Less than RM1500 | STPM | married | Financial assistance | Financial assistance |
next_df = df.query('course in ["Applied Mathematics with Computing", "Financial Mathematics", "Statistics"]')
next_df.reset_index(inplace = True)
next_df.drop("index", 1, inplace = True)
next_df["Status"] = next_df["Status"].map({"No financial asistance": "No financial assistance",
"Financial assistance": "Financial assistance",
"100% scholarship":"100% scholarship",
"50% scholarship":"50% scholarship"})
print("The Expected Total Number of Scholarship Receivers are {}.".format(len(next_df.query(\
'Status != "No financial assistance"'))))
The Expected Total Number of Scholarship Receivers are 50.
receiver = next_df.query('Status != "No financial assistance"')
receiver["Status"].value_counts()
50% scholarship 20 Financial assistance 20 100% scholarship 10 No financial assistance 0 Name: Status, dtype: int64
The tuition fee for each trimester is between RM3,000 - RM4,000. The expected tuition fee for each semester is RM3,500 <<< (RM4,000+RM3,000)/2. Besides, the expected adminstration fee per year is RM1,750.
The total numbers of applicants are 104, while 50 applicants are eligible for the scholarships fund or financial assistance fund while the rest of 54 applicants were failed to meet the criterias.
There are 10 applicants will receive 100% tuition fee and admin fee waiver scholarship, 20 applicants will receive 50% tuition fee waiver scholarship and 20 applicants will receive zero-percent interest-rate student loans.
"""
I forgot to consider the total fund of RM1,000,000 only. So, if assuming each year has two long semesters, the fund is sufficient to cover 50 successful applicants. And it is the upper bound amount without considering if they fail to meet the scholarship or financial assistance minimum requirements.
"""
def expected_fund_out(a, b, c, not_all = False):
tuition_fee = 3500
admin_fee = 1750
sem = 6
year = 3
expense_a = a*(tuition_fee * sem + admin_fee * year)
expense_b = b*(tuition_fee * sem) * 0.5
expense_c = c*(tuition_fee * sem + admin_fee * year)
if not_all:
return expense_a, expense_b, expense_c
else:
return expense_a + expense_b + expense_c
print("The total expected amount of fund paid out are {}.".format(expected_fund_out(10, 20, 20)))
The total expected amount of fund paid out are 997500.0.
a,b,c = expected_fund_out(10, 20, 20, True)
print("The amount of 100% sholarship are {}, 50% scholarship are {} and zero-interest rate student loans are {}.".format(a,b,c))
The amount of 100% sholarship are 262500, 50% scholarship are 210000.0 and zero-interest rate student loans are 525000.